17 research outputs found

Robust Estimators are Hard to Compute

Author: Bernholt Thorsten

In modern statistics, the robust estimation of parameters of a regression hyperplane is a central problem. Robustness means that the estimation is not, or only slightly, affected by outliers in the data. In this paper, it is shown that the following robust estimators are hard to compute: LMS, LQS, LTS, LTA, MCD, MVE, Constrained M estimator, Projection Depth (PD) and Stahel-Donoho. In addition, a data set is presented such that the ltsReg procedure of R has probability less than 0.0001 of finding a correct answer. Furthermore, it is described how to design new robust estimators.

Keywords: computational statistics, complexity theory, robust statistics, algorithms, search heuristics

Research Papers in Economics

Repeated median and hybrid filters

Author: Bernholt Thorsten
Fried Roland
Gather Ursula

Standard median filters preserve abrupt shifts (edges) and remove impulsive noise (outliers) from a constant signal but they deteriorate in trend periods. FIR median hybrid (FMH) filters are more flexible and also preserve shifts, but they are much more vulnerable to outliers. Application of robust regression methods, in particular of the repeated median, has been suggested for removing subsequent outliers from a signal with trends. A fast algorithm for updating the repeated median in linear time using quadratic space is given in Bernholt and Fried (2003). We construct repeated median hybrid filters to combine the robustness properties of the repeated median with the edge preservation ability of FMH filters. An algorithm for updating the repeated median is presented which needs only linear space. We also investigate analytical properties of these filters and compare their performance via simulations.

Keywords: signal extraction, drifts, jumps, outliers, update algorithm
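To make the repeated median mentioned in this abstract concrete, the following is a minimal sketch of a repeated median fit on one window of equally spaced observations. It recomputes everything from scratch in quadratic time, so it is not the update algorithm the abstract refers to; the function name `repeated_median_fit` and the choice of time index 0, 1, ..., n-1 are assumptions made for illustration.

```python
from statistics import median


def repeated_median_fit(y):
    """Return (intercept, slope) of a repeated median line for one window.

    Assumes equally spaced observations at times 0, 1, ..., len(y) - 1.
    Brute-force O(n^2) recomputation, not an update scheme.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    # Inner median: slopes from point i to every other point j.
    # Outer median: taken over all i.
    slope = median(
        median((y[j] - y[i]) / (j - i) for j in range(n) if j != i)
        for i in range(n)
    )
    # Level at time 0: median of the observations after removing the trend.
    intercept = median(y[i] - slope * i for i in range(n))
    return intercept, slope


# Usage: a linear trend with one spike in the middle of the window.
window = [1.0 + 0.5 * t for t in range(9)]
window[4] = 100.0
intercept, slope = repeated_median_fit(window)
```

In this sketch the single spike does not move the fitted trend, which is the kind of robustness a filter based on the repeated median would rely on in trend periods.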

Research Papers in Economics

Modified repeated median filters

Author: Bernholt Thorsten
Fried Roland
Gather Ursula
Wegener Ingo

We discuss moving window techniques for fast extraction of a signal comprising monotonic trends and abrupt shifts from a noisy time series with irrelevant spikes. Running medians remove spikes and preserve shifts, but they deteriorate in trend periods. Modified trimmed mean filters use a robust scale estimate such as the median absolute deviation about the median (MAD) to select an adaptive amount of trimming. Application of robust regression, particularly of the repeated median, has been suggested for improving upon the median in trend periods. We combine these ideas and construct modified filters based on the repeated median offering better shift preservation. All these filters are compared w.r.t. fundamental analytical properties and in basic data situations. An algorithm for the update of the MAD running in time O(log n) for window width n is presented as well.

Keywords: signal extraction, robust filtering, drifts, jumps, outliers, computational geometry, update algorithm
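The MAD and the trimming idea from this abstract can be illustrated with a small sketch. It recomputes the MAD for a window from scratch rather than updating it in O(log n) time, and it leaves the MAD unscaled. The names `mad` and `modified_trimmed_mean`, and the trimming multiplier `q` with its default of 2.0, are assumptions for illustration only.

```python
from statistics import median


def mad(window):
    """Unscaled median absolute deviation about the median."""
    m = median(window)
    return median(abs(x - m) for x in window)


def modified_trimmed_mean(window, q=2.0):
    """Average the observations within q * MAD of the window median.

    q is a hypothetical tuning constant; q >= 1 ensures that at least
    one observation is kept.
    """
    m = median(window)
    s = mad(window)
    kept = [x for x in window if abs(x - m) <= q * s]
    return sum(kept) / len(kept)


# Usage: the spike 100 is trimmed before averaging.
window = [1, 2, 3, 4, 100]
scale = mad(window)
level = modified_trimmed_mean(window)
```

The amount of trimming adapts to the data because the cut-off is tied to the window's own robust scale estimate rather than to a fixed fraction of observations.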

Research Papers in Economics

Computing the Least Quartile Difference Estimator in the Plane

Author: Bernholt Thorsten
Nunkesser Robin
Schettlinger Karen

A common problem in linear regression is that largely aberrant values can strongly influence the results. The least quartile difference (LQD) regression estimator is highly robust, since it can resist up to almost 50% largely deviant data values without becoming extremely biased. Additionally, it shows good behavior on Gaussian data – in contrast to many other robust regression methods. However, the LQD is not widely used yet due to the high computational effort needed when using common algorithms, e.g. the subset algorithm of Rousseeuw and Leroy. For computing the LQD estimator for n data points in the plane, we propose a randomized algorithm with expected running time O(n^2 log^2 n) and an approximation algorithm with a running time of roughly O(n^2 log n). It can be expected that the practical relevance of the LQD estimator will strongly increase thereby.

Research Papers in Economics

Constrained Minkowski Sums: A Geometric Framework for Solving Interval Problems in Computational Biology Efficiently

Author: Bernholt Thorsten
Eisenbrand Friedrich
Hofmeister Thomas
Publication date: 18/06/2018

In this paper, we introduce the notion of a constrained Minkowski sum: for two (finite) point-sets P, Q ⊆ ℝ^2 and a set of k inequalities Ax ≥ b, it is defined as the point-set (P ⊕ Q)_{Ax≥b} = {x = p + q ∣ p ∈ P, q ∈ Q, Ax ≥ b}. We show that typical interval problems from computational biology can be solved by computing a set containing the vertices of the convex hull of an appropriately constrained Minkowski sum. We provide an algorithm for computing such a set with running time O(N log N), where N = |P| + |Q|, if k is fixed. For the special case (P ⊕ Q)_{x_1≥β}, where P and Q consist of points with integer x_1-coordinates whose absolute values are bounded by O(N), we even achieve a linear running time O(N). We thereby obtain a linear running time for many interval problems from the literature and improve upon the best known running times for some of them. The main advantage of the presented approach is that it provides a general framework within which a broad variety of interval problems can be modeled and solved.
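The definition of the constrained Minkowski sum in this abstract can be read off directly as a brute-force enumeration. The sketch below does exactly that and nothing more: it takes time proportional to |P|·|Q|·k and returns the whole constrained sum, so it is not the O(N log N) algorithm of the paper, which computes only a set containing the convex hull vertices. The function name and the representation of A as a list of coefficient pairs are assumptions.

```python
def constrained_minkowski_sum(P, Q, A, b):
    """Return {p + q : p in P, q in Q, A(p + q) >= b} for planar points.

    P, Q: iterables of (x1, x2) pairs.
    A: list of k coefficient pairs (a1, a2); b: list of k right-hand sides.
    Brute force over all pairs; illustrative only.
    """
    result = set()
    for p1, p2 in P:
        for q1, q2 in Q:
            x1, x2 = p1 + q1, p2 + q2
            # Keep the sum only if every inequality a1*x1 + a2*x2 >= b_i holds.
            if all(a1 * x1 + a2 * x2 >= bi for (a1, a2), bi in zip(A, b)):
                result.add((x1, x2))
    return result


# Usage: the special case with a single constraint x1 >= 1.
P = [(0, 0), (1, 2)]
Q = [(0, 1), (3, 0)]
points = constrained_minkowski_sum(P, Q, A=[(1, 0)], b=[1])
```

Of the four possible sums in this example, only (0, 1) violates x1 ≥ 1 and is dropped.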

RERO DOC Digital Library

Detecting high-order interactions of single nucleotide polymorphisms using genetic programming

Author: Bernholt Thorsten
Ickstadt Katja
Nunkesser Robin
Schwender Holger
Wegener Ingo

Motivation: Not individual single nucleotide polymorphisms (SNPs), but high-order interactions of SNPs are assumed to be responsible for complex diseases such as cancer. Therefore, one of the major goals of genetic association studies concerned with such genotype data is the identification of these high-order interactions. This search is additionally impeded by the fact that these interactions often are only explanatory for a relatively small subgroup of patients. Most of the feature selection methods proposed in the literature, unfortunately, fail at this task, since they can either only identify individual variables or interactions of a low order, or try to find rules that are explanatory for a high percentage of the observations. In this paper, we present a procedure based on genetic programming and multi-valued logic that enables the identification of high-order interactions of categorical variables such as SNPs. This method, called GPAS (Genetic Programming for Association Studies), can not only be used for feature selection, but can also be employed for discrimination. Results: In an application to the genotype data from the GENICA study, an association study concerned with sporadic breast cancer, GPAS is able to identify high-order interactions of SNPs leading to a considerably increased breast cancer risk for different subsets of patients that are not found by other feature selection methods. As an application to a subset of the HapMap data shows, GPAS is not restricted to association studies comprising several tens of SNPs, but can also be employed to analyze whole-genome data.

Research Papers in Economics

Constrained Minkowski Sums: A Geometric Framework for Solving Interval Problems in Computational Biology Efficiently

Author: Bernholt Thorsten
Eisenbrand Friedrich
Hofmeister Thomas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/11/2010

In this paper, we introduce the notion of a constrained Minkowski sum: for two (finite) point-sets P, Q ⊆ ℝ^2 and a set of k inequalities Ax ≥ b, it is defined as the point-set (P ⊕ Q)_{Ax≥b} = {x = p + q ∣ p ∈ P, q ∈ Q, Ax ≥ b}. We show that typical interval problems from computational biology can be solved by computing a set containing the vertices of the convex hull of an appropriately constrained Minkowski sum. We provide an algorithm for computing such a set with running time O(N log N), where N = |P| + |Q|, if k is fixed. For the special case (P ⊕ Q)_{x_1≥β}, where P and Q consist of points with integer x_1-coordinates whose absolute values are bounded by O(N), we even achieve a linear running time O(N). We thereby obtain a linear running time for many interval problems from the literature and improve upon the best known running times for some of them. The main advantage of the presented approach is that it provides a general framework within which a broad variety of interval problems can be modeled and solved.

Infoscience - École polytechnique fédérale de Lausanne

Computing the least median of squares estimator in time O(n^d)

Author: Thorsten Bernholt
Publication date: 01/01/2005

In modern statistics, the robust estimation of parameters of a regression hyperplane is a central problem, i.e., an estimation that is not or only slightly affected by outliers in the data. In this paper we will consider the least median of squares (LMS) estimator. For n points in d dimensions we describe a randomized algorithm for LMS running in O(n^d) time and O(n) space, for d fixed, and in time O(d^3 · (2n)^d) and O(dn) space, for arbitrary d.
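As an illustration of the LMS criterion named in this abstract, the sketch below evaluates the median of squared residuals for a line in the plane and searches only over lines through pairs of data points. This is a simple candidate search under stated assumptions, not the randomized algorithm of the paper, and it does not in general return the exact LMS minimizer. The function names are hypothetical, and the plain sample median is used as the objective, whereas some definitions use a specific order statistic instead.

```python
from itertools import combinations
from statistics import median


def lms_objective(points, slope, intercept):
    """Median of squared residuals of the line y = slope * x + intercept."""
    return median((y - (slope * x + intercept)) ** 2 for x, y in points)


def lms_line_through_pairs(points):
    """Best (objective, slope, intercept) among lines through two data points.

    Heuristic candidate search; returns None if no pair has distinct x values.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = slope * x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        value = lms_objective(points, slope, intercept)
        if best is None or value < best[0]:
            best = (value, slope, intercept)
    return best


# Usage: seven points on y = 2x + 1 plus two gross outliers.
data = [(x, 2 * x + 1) for x in range(7)] + [(7, -20), (8, 40)]
value, slope, intercept = lms_line_through_pairs(data)
```

Because more than half of the points lie exactly on one line in this example, the median squared residual of that line is zero and the two outliers have no influence on the result.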

CiteSeerX

Algorithms, Theory

Author: Friedrich Eisenbrand
Thomas Hofmeister
Thorsten Bernholt

In this paper, we introduce the notion of a constrained Minkowski sum which for two (finite) point-sets P, Q ⊆ ℝ^2 and a set of k inequalities Ax ≥ b is defined as the point-set (P ⊕ Q)_{Ax≥b} = {x = p + q | p ∈ P, q ∈ Q, Ax ≥ b}. We show that typical subsequence problems from computational biology can be solved by computing a set containing the vertices of the convex hull of an appropriately constrained Minkowski sum. We provide an algorithm for computing such a set with running time O(N log N), where N = |P| + |Q|, if k is fixed. For the special case (P ⊕ Q)_{x_1≥β}, where P and Q consist of points with integer x_1-coordinates whose absolute values are bounded by O(N), we even achieve a linear running time O(N). We thereby obtain a linear running time for many subsequence problems from the literature and improve upon the best known running times for some of them. The main advantage of the presented approach is that it provides a general framework within which a broad variety of subsequence problems can be modeled and solved. This includes objective functions and constraints which are even more complex than the ones considered before.

CiteSeerX

Repeated median and hybrid filters

Author: Bernholt Thorsten
Fried Roland
Gather Ursula

Research Papers in Economics